ABSTRACT



Conventional canonical methods distinguish between the
two variable sets being analyzed, but the methods do not
attempt to optimize the variance from a given variable set
that will be contained in the final solution. In this respect
canonical methods are said to be "symmetric." The paper
proposes two non-symmetric, canonical-like techniques that can
be employed when theoretical or utility considerations suggest
that one variable set (usually the criterion set) should be
emphasized over the other variable set. The criteria that the
two methods meet are discussed, as are various software
considerations.



Kerlinger (1973, p. 652) has suggested that "it is not
easy to find research studies that have used canonical
analysis. In earlier years, of course, the calculations
involved were prohibitive. Today, even with computer
facilities and programs available, the method is evidently not
well known. This is regrettable, because some research
problems almost demand canonical analysis." More recently,
Thorndike (1977, p. 76) expressed similar sentiments: "Given
the substantial theoretical literature on canonical analysis,
it is surprising to find that the technique has seen
relatively infrequent use by researchers studying substantive
problems. Instances where the methods of canonical analysis
have been applied rather than studied for their own sake are
relatively rare (but on the increase)." Still more recently,
Kettenring (1982, p. 355) noted that "since canonical analysis
is usually considered to be one of the 'major methods' of
multivariate analysis, it is perhaps surprising that this
technique does not in fact play a larger role in data
analysis."

Several possible causes for a reticence to apply
canonical methods have been cited, including the complexity of
canonical mathematics (Thompson, 1982, p. 467) and
difficulties in interpreting canonical results (Thompson,
1980a, pp. 16-17). However, Cronbach (1971, pp. 489-490) has
argued that "statistical devices such as canonical
correlation, for handling several predictors and
several criteria simultaneously, are not appropriate for the
decision-oriented study... Utility depends upon values, not
upon the statistical connections of scores." Kettenring (1982,
p. 365) made a similar point: "To achieve its potential,
better methods are needed for selecting 'canonical variables'
which have practical as well as theoretical interest and for
making statistical inferences about them." These views stand
somewhat in contrast with Levine's (1977, p. 8) position that
"especially with respect to canonical correlation, there seem
to be relatively few remaining puzzles to be solved."

The purpose of this paper is to present an extension of
those canonical methods which have traditionally been
available to researchers. The extension will be discussed in
the context of a concrete heuristic example. However, some
discussion of more conventional canonical methods (Hotelling,
1935) is required to form the framework for the presentation.

The Logic of Canonical Analysis 
Canonical correlation analysis is employed to study
relationships between two variable sets when each variable set
consists of at least two variables. Thus, Table 1 presents
the data for what is the simplest canonical case, since only
two criterion variables, X and Y, and only two predictor
variables, A and B, are involved. Of course, these data are
presented only for the purposes of discussion, since the
hypothetical sample size (n = 10) is absurdly small.

INSERT TABLE 1 ABOUT HERE.

The first step in a canonical correlation analysis
involves the calculation of the intervariable correlation
matrix. A square (though not necessarily symmetric) matrix,
with as many rows and columns as there are variables in the
smaller of the two variable sets, is then derived from the
intervariable correlation matrix (see Cooley & Lohnes, 1971,
p. 176 for details). This matrix is presented in Table 2. The
eigenvalues of the Table 2 matrix (as some readers may wish to
verify by subjecting the matrix to a principal components
analysis) each represent a squared canonical correlation
coefficient. Since the number of eigenvalues which can be
calculated for such a matrix equals the number of rows (or
columns) in the matrix, it should be clear that the maximum
number of canonical correlation coefficients which can be
derived for a data set equals the number of variables in the
smaller of the two variable sets. Because in this case both
variable sets consist of two variables, only two canonical
correlation coefficients can be calculated.

INSERT TABLE 2 ABOUT HERE.
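The steps just described can be checked numerically. The following is a minimal Python/NumPy sketch, not the software used for this paper: it reads the raw scores from Table 1 and assumes that the analyzed matrix is the predictor-side product Rpp^-1 Rpc Rcc^-1 Rcp, which reproduces the Table 2 entries to within rounding. (The criterion-side product Rcc^-1 Rcp Rpp^-1 Rpc has the same nonzero eigenvalues.) The block names Rcc, Rcp, Rpc, and Rpp are introduced here for illustration only.

```python
import numpy as np

# Raw scores for the 10 hypothetical cases, read from Table 1.
X = [1, 5, 3, 3, 3, 2, 2, 0, 0, 2]  # criterion variable X
Y = [9, 4, 9, 4, 3, 2, 0, 2, 9, 3]  # criterion variable Y
A = [4, 0, 6, 6, 9, 9, 2, 0, 1, 0]  # predictor variable A
B = [6, 8, 0, 9, 0, 0, 9, 5, 6, 5]  # predictor variable B

# Step 1: the 4 x 4 intervariable correlation matrix (each row is a variable).
R = np.corrcoef([X, Y, A, B])
Rcc, Rcp = R[:2, :2], R[:2, 2:]  # criterion block, criterion-predictor block
Rpc, Rpp = R[2:, :2], R[2:, 2:]  # predictor-criterion block, predictor block

# Step 2: the analyzed matrix Rpp^-1 Rpc Rcc^-1 Rcp. It is square, with one
# row per variable in the smaller set, but in general it is not symmetric.
M = np.linalg.solve(Rpp, Rpc) @ np.linalg.solve(Rcc, Rcp)

# Step 3: its eigenvalues are the squared canonical correlations.
eigvals = np.sort(np.linalg.eigvals(M).real)[::-1]
canonical_r = np.sqrt(eigvals)

print("Analyzed matrix:\n", np.round(M, 3))
print("Squared canonical correlations:", np.round(eigvals, 3))
print("Canonical correlations:", np.round(canonical_r, 3))
```

The larger eigenvalue should come out near .093, whose square root is the .305 canonical correlation discussed with Figure 1.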

A squared canonical correlation coefficient indicates the
proportion of variance which two composites derived from the
two variable sets linearly share. The composites are derived
by multiplying the Z scores of each person on each variable by
the corresponding canonical function coefficients and summing
the products within each variable set. Thus, a canonical
correlation coefficient is simply the Pearson product-moment
bivariate correlation between two linear composites derived
from the two variable sets.

Table 3 provides an illustration of these calculations.
The table only presents the coefficients for the first
canonical function, although it has already been noted that a
second function could have been calculated. The canonical
correlation is the bivariate correlation between the two table
columns headed "Criterion Composite" and "Predictor
Composite." These 10 pairs of values are plotted in the Figure
1 scattergram. The figure also presents the regression line
for the two sets of 10 composites. Since the composites are
themselves also in Z score form, the regression line passes
through the mean of both composites, i.e., the origin where
the two axes intersect, and the line's slope (.305) equals the
bivariate correlation between the two composites and also the
canonical correlation between the two variable sets.

INSERT TABLE 3 AND FIGURE 1 ABOUT HERE.
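A companion sketch, again assuming NumPy and the Table 1 scores, derives the first function's coefficients, forms the Table 3 composites, and confirms that the correlation between the two composites equals the slope of the regression line. Taking the weights from the leading eigenvector and scaling them so that each composite has unit variance are standard choices assumed here; the original computations may have been organized differently.

```python
import numpy as np

# Raw scores from Table 1 (columns: X, Y, A, B).
data = np.array([
    [1, 9, 4, 6], [5, 4, 0, 8], [3, 9, 6, 0], [3, 4, 6, 9], [3, 3, 9, 0],
    [2, 2, 9, 0], [2, 0, 2, 9], [0, 2, 0, 5], [0, 9, 1, 6], [2, 3, 0, 5],
], dtype=float)

# Z scores using the n - 1 standard deviation, as in the Table 1 parentheses.
Z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
R = np.corrcoef(Z, rowvar=False)
Rcc, Rcp, Rpc, Rpp = R[:2, :2], R[:2, 2:], R[2:, :2], R[2:, 2:]

# Predictor weights (F3, F4): leading eigenvector of Rpp^-1 Rpc Rcc^-1 Rcp.
M = np.linalg.solve(Rpp, Rpc) @ np.linalg.solve(Rcc, Rcp)
vals, vecs = np.linalg.eig(M)
b = vecs[:, np.argmax(vals.real)].real
b = b * np.sign(b[0])          # eigenvector sign is arbitrary; make F3 positive
b = b / np.sqrt(b @ Rpp @ b)   # scale so the predictor composite has unit variance

# Criterion weights (F1, F2) are proportional to Rcc^-1 Rcp b.
a = np.linalg.solve(Rcc, Rcp @ b)
a = a / np.sqrt(a @ Rcc @ a)   # unit-variance criterion composite

criterion_composite = Z[:, :2] @ a  # (ZX)(F1) + (ZY)(F2)
predictor_composite = Z[:, 2:] @ b  # (ZA)(F3) + (ZB)(F4)

rc = np.corrcoef(criterion_composite, predictor_composite)[0, 1]
slope, intercept = np.polyfit(criterion_composite, predictor_composite, 1)
print("F1, F2 =", np.round(a, 2), " F3, F4 =", np.round(b, 2))
print("canonical r =", round(rc, 3), " regression slope =", round(slope, 3))
```

The coefficients should agree with the Table 3 note (F1 = +0.94, F2 = -0.22, F3 = +1.30, F4 = +0.94) to within about .01, and the slope equals the correlation because both composites have unit variance.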




Some Non-symmetric Canonical-like Methods
As the previous discussion suggests, conventional
canonical methods derive equation weights which optimize the
correlations between the canonical composites. Conventional
methods thus give similar consideration to both variable sets
when deriving function coefficients, i.e., the methods
distinguish between the predictor and the criterion variable
sets but do not emphasize one variable set over the other. In
this respect conventional canonical methods may be said to be
"symmetric." These features of conventional canonical analysis
are disturbing in research situations where the researcher may
wish to emphasize one variable set over the other. Two
"non-symmetric" methods of attending to variable set
distinctions will be presented here in the context of a
concrete heuristic example.

The correlation matrix presented in the bottom triangle
of the Table 4 matrix provides the heuristic. The data
(n = 235) were reported by Thompson (1980b) and involved two
sets of variables of size four and 10, respectively. Table 5
presents the results obtained by analyzing the data with
conventional, symmetric canonical methods. The sum of the four
squared canonical correlation coefficients for the Table 5
results was .593. Since conventional functions are
uncorrelated, the squared correlations can be added to
determine the cumulative proportion of information shared by
all the possible canonical composites that could be derived
from a data set in which the smaller variable set contained
four variables.

INSERT TABLES 4 AND 5 ABOUT HERE.
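The additivity argument can be sketched for the general case. The helper below, a hypothetical function named canonical_analysis, returns every squared canonical correlation together with unit-variance predictor weights; the random data merely mimic the set sizes of the heuristic example (four and 10 variables) and are not the Thompson (1980b) data. The sketch checks that the composites of different functions are uncorrelated, which is what permits summing their squared correlations, and that the sum equals the trace of the analyzed matrix.

```python
import numpy as np

def canonical_analysis(predictors, criteria):
    """Return the squared canonical correlations (largest first), predictor
    weights scaled to unit-variance composites, and the analyzed matrix
    Rpp^-1 Rpc Rcc^-1 Rcp. Assumes the predictor set is the smaller set."""
    p = predictors.shape[1]
    R = np.corrcoef(np.hstack([predictors, criteria]), rowvar=False)
    Rpp, Rpc = R[:p, :p], R[:p, p:]
    Rcp, Rcc = R[p:, :p], R[p:, p:]
    M = np.linalg.solve(Rpp, Rpc) @ np.linalg.solve(Rcc, Rcp)  # p x p
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(vals.real)[::-1]
    vals, vecs = vals.real[order], vecs.real[:, order]
    # Scale each weight vector w so that w' Rpp w = 1 (unit-variance composite).
    vecs = vecs / np.sqrt(np.einsum("ij,ik,kj->j", vecs, Rpp, vecs))
    return vals, vecs, M

# Hypothetical random data with the example's set sizes.
rng = np.random.default_rng(0)
n, p, q = 235, 4, 10
shared = rng.standard_normal((n, 1))
predictors = rng.standard_normal((n, p)) + 0.5 * shared
criteria = rng.standard_normal((n, q)) + 0.5 * shared
Zp = (predictors - predictors.mean(0)) / predictors.std(0, ddof=1)

rc2, weights, M = canonical_analysis(predictors, criteria)
composites = Zp @ weights  # one predictor composite per canonical function
print("squared canonical correlations:", np.round(rc2, 3))
print("sum:", round(rc2.sum(), 3), " trace of analyzed matrix:", round(np.trace(M), 3))
print("correlations among composites:\n", np.round(np.corrcoef(composites, rowvar=False), 3))
```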
Suppose, however, that the researcher wishes to derive
function weights for the 10 criterion variables subject to the
restriction that all of the variance in the criterion
variables must be represented in the final solution. One
possible way to do this is to employ criterion variable
weights such as those presented in Table 6. These weights
imply that y multiple regression analyses were conducted
separately for each of the y criterion variables.

INSERT TABLE 6 ABOUT HERE.



The squared multiple correlations from Table 6 sum to
.660. However, this result overestimates the cumulative
proportion of information shared by the score composites,
because the Table 6 functions are not uncorrelated. The
argument that these functions are correlated should seem
reasonable since the criterion variable weights were
designated without considering the correlations among the
variables.
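To see why separate regressions can misstate the cumulative proportion, the following sketch (NumPy, random hypothetical data, helper names invented for illustration) computes one R^2 per criterion, as the Table 6 weights imply, and compares their sum with the sum of the squared canonical correlations. With correlated criteria the two sums generally differ; once the criteria are transformed to be mutually uncorrelated (a Cholesky whitening is used here purely as one convenient choice), they agree.

```python
import numpy as np

def squared_multiple_correlations(predictors, criteria):
    """R^2 for each criterion regressed on all predictors, one regression
    per criterion (the approach the Table 6 weights imply)."""
    p = predictors.shape[1]
    R = np.corrcoef(np.hstack([predictors, criteria]), rowvar=False)
    Rpp, Rpc = R[:p, :p], R[:p, p:]
    betas = np.linalg.solve(Rpp, Rpc)         # beta weights, p x q
    return np.einsum("ij,ij->j", Rpc, betas)  # R^2_j = r_j' Rpp^-1 r_j

def sum_squared_canonical(predictors, criteria):
    """Trace of Rpp^-1 Rpc Rcc^-1 Rcp, the sum of squared canonical correlations."""
    p = predictors.shape[1]
    R = np.corrcoef(np.hstack([predictors, criteria]), rowvar=False)
    Rpp, Rpc, Rcp, Rcc = R[:p, :p], R[:p, p:], R[p:, :p], R[p:, p:]
    return np.trace(np.linalg.solve(Rpp, Rpc) @ np.linalg.solve(Rcc, Rcp))

# Hypothetical data: the 10 criteria share a common source, so they correlate.
rng = np.random.default_rng(1)
n, p, q = 235, 4, 10
predictors = rng.standard_normal((n, p))
common = rng.standard_normal((n, 1))
criteria = 0.4 * predictors @ rng.standard_normal((p, q)) + common + rng.standard_normal((n, q))

r2 = squared_multiple_correlations(predictors, criteria)
sum_canonical = sum_squared_canonical(predictors, criteria)

# Whiten the criteria: Zc Lc^-T has an identity correlation matrix.
Zc = (criteria - criteria.mean(0)) / criteria.std(0, ddof=1)
Lc = np.linalg.cholesky(np.corrcoef(Zc, rowvar=False))
white = np.linalg.solve(Lc, Zc.T).T
r2_white = squared_multiple_correlations(predictors, white)

print("sum of separate R^2 (raw criteria):", round(r2.sum(), 3))
print("sum of separate R^2 (whitened):    ", round(r2_white.sum(), 3))
print("sum of squared canonical r:        ", round(sum_canonical, 3))
```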

However, it is possible to provide a non-symmetric method
which (a) employs all the variance in the criterion variables
(presuming, as will usually be the case, that this is the
variable set of most interest to the researcher), (b) sheds
light on the structure underlying the correlations among the
criterion variables, and (c) allows uncorrelated functions to
be defined, at least when the predictor variables are
uncorrelated. The method requires three steps. First, the
intradomain criterion correlations are subjected to a
full-rank principal components analysis, using a readily
available routine from a computer package such as SPSS
(TYPE=PA1). By definition, a full-rank principal components
solution contains all of the variance present in the y
criterion variables. Second, the orthogonal components are
rotated to some orthogonal criterion (in this case Varimax),
and factor scores are calculated (scores are requested in SPSS
with the inclusion of "FACSCORE" on the procedure card).
Finally, correlation coefficients among these factor scores
and the predictor variables are computed (see upper triangle
of the Table 4 matrix) and all possible y multiple regression
equations are calculated. Again, routines for performing this
step of the analysis are readily available. The results
produced in this manner for the heuristic data are presented
in Table 7. The function weights for the criterion variables
are the "factor score coefficients" (obtained from SPSS during
the principal components step of the analysis merely by
requesting STATISTIC 7), which are derived by postmultiplying
the inverse of the intradomain criterion variable correlation
matrix by the rotated principal components matrix; the
function coefficients for the predictor variables are the beta
weights derived from the regression analysis.

INSERT TABLE 7 ABOUT HERE.
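The three steps of this first non-symmetric method can be sketched in NumPy as follows. This is an illustration under stated assumptions, not a reproduction of the SPSS runs: numpy.linalg.eigh stands in for the TYPE=PA1 component extraction, the varimax routine is a plain implementation written for this sketch, the factor score coefficients are computed directly as R^-1 C (the quantity the text says STATISTIC 7 supplies), and the data are random because the Table 4 correlations are not reproduced here. Since every variable has a communality of 1 in a full-rank solution, Kaiser row normalization before varimax would leave the result unchanged.

```python
import numpy as np

def full_rank_components(R):
    """Unrotated principal component loadings keeping all V components,
    so that C @ C.T reproduces R exactly."""
    vals, vecs = np.linalg.eigh(R)          # eigenvalues in ascending order
    vals, vecs = vals[::-1], vecs[:, ::-1]  # largest component first
    return vecs * np.sqrt(vals)

def varimax(L, max_iter=500, tol=1e-10):
    """Raw varimax rotation (Kaiser's criterion, SVD iteration).
    Returns the rotated loadings and the orthogonal rotation matrix."""
    n_rows, k = L.shape
    T = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        d_old = d
        Lr = L @ T
        G = L.T @ (Lr ** 3 - Lr * (np.sum(Lr ** 2, axis=0) / n_rows))
        U, s, Vt = np.linalg.svd(G)
        T = U @ Vt
        d = s.sum()
        if d_old != 0 and d / d_old < 1 + tol:
            break
    return L @ T, T

# Hypothetical data: 4 predictors, 10 criteria.
rng = np.random.default_rng(2)
n, p, q = 235, 4, 10
predictors = rng.standard_normal((n, p))
criteria = 0.3 * predictors @ rng.standard_normal((p, q)) + rng.standard_normal((n, q))
Zc = (criteria - criteria.mean(0)) / criteria.std(0, ddof=1)
Rcc = np.corrcoef(Zc, rowvar=False)

# Steps 1 and 2: full-rank components, varimax-rotated; then factor scores.
C, T = varimax(full_rank_components(Rcc))
W = np.linalg.solve(Rcc, C)  # factor score coefficients R^-1 C
F = Zc @ W                   # one score per rotated component

# Step 3: one multiple regression of each factor score on the predictors.
R = np.corrcoef(np.hstack([predictors, F]), rowvar=False)
Rpp, Rpf = R[:p, :p], R[:p, p:]
betas = np.linalg.solve(Rpp, Rpf)       # predictor beta weights, p x q
r2 = np.einsum("ij,ij->j", Rpf, betas)  # squared multiple correlations

# Conventional (symmetric) analysis of the same data, for comparison.
R0 = np.corrcoef(np.hstack([predictors, Zc]), rowvar=False)
sum_canonical = np.trace(np.linalg.solve(R0[:p, :p], R0[:p, p:])
                         @ np.linalg.solve(R0[p:, p:], R0[p:, :p]))
print("sum of R^2 for the component scores:", round(r2.sum(), 3))
print("sum of squared canonical correlations:", round(sum_canonical, 3))
```

The two printed sums should agree, paralleling the .593 = .593 result reported in the text: the full-rank rotated component scores are uncorrelated, unit-variance re-expressions of the criterion set, so their squared multiple correlations add up to the sum of the squared canonical correlations.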
Since in this case the four predictor variables happened
to be uncorrelated, and since this non-symmetric procedure
always produces uncorrelated criterion composites, the
functions themselves are uncorrelated. Thus, the sum of the
squared multiple correlations was .593, just as the sum of the
squared canonical correlation coefficients was also .593 when
conventional canonical functions were computed.

A second and even more elegant non-symmetric method can be
proposed. This method (a) employs all the variance in the
criterion variables, (b) sheds light on the structure
underlying the correlations among the criterion variables, (c)
allows orthogonal functions to be defined, and (d) is
"confirmatory" in that a priori expectations regarding the
criterion variables are considered. This last element is
responsive to Cronbach's (1971) previously mentioned utility
concerns and can minimize the extent to which the solution
capitalizes on sampling error. However, unlike the previously
discussed procedures, this second method cannot be
implemented solely with the use of widely available
statistical packages such as SPSS.



The procedure requires four steps. First, the
intradomain criterion correlations are subjected to a
full-rank principal components analysis. Second, the
components are rotated to a position of "best fit" with an a
priori defined "target" matrix (for an example see Thompson &
Pitts, 1981/82), typically consisting of ones, zeroes, and
negative ones. This rotation can be performed using the
computer program provided by Veldman (1967). Third, factor
scores are computed using the least squares algorithm:

$$\mathbf{Z}_{N \times V}\,\mathbf{R}^{-1}_{V \times V}\,\mathbf{C}_{V \times F} = \mathbf{F}_{N \times F}$$

where Z refers to the scores of the N people on the V
criterion variables in standardized form, R is the V x V
intervariable correlation matrix of those variables, and C is
the rotated principal components matrix derived in the
previous step. The calculation of these factor scores can be
facilitated at most computing facilities by the use of fairly
"user friendly" utility packages such as IMSL. Finally, the
correlation coefficients among the y factor scores and the
predictor variables are computed and all possible y multiple
regression equations are computed.
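The four steps of this second method can likewise be sketched. Veldman's (1967) program and IMSL are not reproduced here; as an assumption, the "best fit" rotation is taken to be an orthogonal Procrustes rotation (the orthogonal matrix T that minimizes the squared distance between LT and the target), and the least squares factor scores Z R^-1 C are computed with numpy.linalg.solve. The 4 x 4 target of ones, zeroes, and negative ones, the helper name procrustes_rotation, and the data are all hypothetical.

```python
import numpy as np

def procrustes_rotation(L, target):
    """Orthogonal matrix T minimizing the Frobenius norm of L @ T - target
    (the orthogonal Procrustes solution, via the SVD of L.T @ target)."""
    U, _, Vt = np.linalg.svd(L.T @ target)
    return U @ Vt

# Hypothetical data: criteria 1-2 and 3-4 each share a source, and the
# sources are partly predictable from the first two predictors.
rng = np.random.default_rng(3)
n, p, q = 235, 4, 4
predictors = rng.standard_normal((n, p))
sources = rng.standard_normal((n, 2)) + 0.5 * predictors[:, :2]
criteria = np.repeat(sources, 2, axis=1) + 0.7 * rng.standard_normal((n, q))

# Hypothetical a priori target: two pair components and two contrasts.
target = np.array([[1, 0,  1,  0],
                   [1, 0, -1,  0],
                   [0, 1,  0,  1],
                   [0, 1,  0, -1]], dtype=float)

Z = (criteria - criteria.mean(0)) / criteria.std(0, ddof=1)
R = np.corrcoef(Z, rowvar=False)

# Step 1: full-rank principal components of the criterion correlations.
vals, vecs = np.linalg.eigh(R)
L = vecs[:, ::-1] * np.sqrt(vals[::-1])
# Step 2: rotate the components to best fit with the target.
T = procrustes_rotation(L, target)
C = L @ T
# Step 3: least squares factor scores, F = Z R^-1 C.
F = Z @ np.linalg.solve(R, C)
# Step 4: one multiple regression of each factor score on the predictors.
R_all = np.corrcoef(np.hstack([predictors, F]), rowvar=False)
betas = np.linalg.solve(R_all[:p, :p], R_all[:p, p:])
r2 = np.einsum("ij,ij->j", R_all[:p, p:], betas)

print("rotated components:\n", np.round(C, 2))
print("squared multiple correlations:", np.round(r2, 3))
```

Because the Procrustes matrix is orthogonal, the factor scores remain uncorrelated, consistent with property (c); an oblique target rotation would not guarantee this.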

Discussion
The non-symmetric methods suggested here can provide, at
least in some instances, both substantive and heuristic
benefits when compared with conventional canonical methods.
The non-symmetric methods augment several very helpful
extensions of conventional canonical methodology, including
most notably stepwise techniques (Rim, 1972; Thompson, 1982),
part and partial canonical methods (Lee, 1978), and redundancy
analysis (van den Wollenberg, 1977). Of course, when the
non-symmetric methods are applied for the purposes of actual
substantive inquiry, it becomes important to supplement the
analysis by the computation of the variables' structure
coefficients (see Levine, 1977, p. 20). The reader is also
cautioned that experimentwise Type I error rates are inflated
by the use of traditional regression test statistics with the
non-symmetric methods. However, when the functions are
uncorrelated, as they were here, the exact experimentwise
error rate would be $\alpha^{*} = 1 - (1 - \alpha)^{y}$, where
y is the number of criterion variables. A reasonable approach
in such a case might be to test each function at the
$\alpha / y$ level of statistical significance.
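The error-rate formula is easy to evaluate. A short sketch, with alpha = .05 chosen for illustration and y = 10 criterion variables as in the heuristic example, treating the y tests as independent as the exact formula assumes (the function name is hypothetical):

```python
def experimentwise_alpha(alpha, y):
    """Exact experimentwise Type I error rate for y independent tests,
    each conducted at testwise level alpha: 1 - (1 - alpha) ** y."""
    return 1 - (1 - alpha) ** y

y = 10        # number of criterion variables in the heuristic example
alpha = 0.05  # illustrative testwise level

print(round(experimentwise_alpha(alpha, y), 4))      # 0.4013
print(round(experimentwise_alpha(alpha / y, y), 4))  # 0.0489
```

Testing each function at alpha/y keeps the experimentwise rate just under the nominal .05.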






The non-symmetric methods presented here will be most
helpful when there are clear theoretical distinctions between
the predictor and the criterion variable sets, and when the
research situation implies that optimizing the variance of the
criterion variables is at least as important as optimizing the
correlation between the variable sets' composites. The
non-symmetric methods will also be most appropriate when there
is a definite interest in exploratory (the varimax rotated
solution) or "confirmatory" (the "best fit" solution)
investigation of the structure underlying the criterion
variables. Thus, the techniques will be particularly potent
when the criterion variables are themselves correlated (as
they tended not to be in the heuristic example presented
here), because then a few composites can contain the
preponderance of the criterion variables' variance, and a more
parsimonious solution will result.

But the non-symmetric methods are also valuable to the
extent that they may help to demystify conventional canonical
methods. It has been noted that canonical analysis is
essentially a principal components analysis of a particular
matrix derived from the intervariable correlation matrix
(e.g., Table 2). Similarly, the non-symmetric methods relied
heavily upon the use of principal components analyses. Thus
it can be suggested that the symmetric and the non-symmetric
methods are somewhat analogous. These conceptual linkages
among the techniques merely suggest, as Knapp (1978) has
shown, that canonical methods represent a most-general
data-analytic system, and that canonical methods subsume all
parametric statistical techniques.
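Knapp's (1978) point can be illustrated in its simplest case. With only one criterion variable the analyzed matrix is 1 x 1, and its single eigenvalue, the squared canonical correlation, equals the squared multiple correlation from ordinary least squares regression. The sketch below checks this with random hypothetical data.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 100, 3
predictors = rng.standard_normal((n, p))
criterion = predictors @ np.array([0.5, -0.3, 0.2]) + rng.standard_normal(n)

# With a single criterion, Rcc is the 1 x 1 matrix [1], so the analyzed
# matrix Rcc^-1 Rcp Rpp^-1 Rpc collapses to the scalar r' Rpp^-1 r.
R = np.corrcoef(np.column_stack([predictors, criterion]), rowvar=False)
Rpp, r = R[:p, :p], R[:p, p]
canonical_r2 = r @ np.linalg.solve(Rpp, r)

# Ordinary least squares multiple regression with an intercept.
X1 = np.column_stack([np.ones(n), predictors])
coef, *_ = np.linalg.lstsq(X1, criterion, rcond=None)
resid = criterion - X1 @ coef
multiple_r2 = 1 - resid @ resid / np.sum((criterion - criterion.mean()) ** 2)

print(round(canonical_r2, 6), round(multiple_r2, 6))
```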
Table 1
Hypothetical Data Set

Case   X           Y           A           B
  1    1 (-0.72)   9 (+1.36)   4 (+0.08)   6 (+0.33)
  2    5 (+1.90)   4 (-0.15)   0 (-1.02)   8 (+0.88)
  3    3 (+0.59)   9 (+1.36)   6 (+0.64)   0 (-1.33)
  4    3 (+0.59)   4 (-0.15)   6 (+0.64)   9 (+1.16)
  5    3 (+0.59)   3 (-0.45)   9 (+1.46)   0 (-1.33)
  6    2 (-0.07)   2 (-0.76)   9 (+1.46)   0 (-1.33)
  7    2 (-0.07)   0 (-1.36)   2 (-0.47)   9 (+1.16)
  8    0 (-1.38)   2 (-0.76)   0 (-1.02)   5 (+0.06)
  9    0 (-1.38)   9 (+1.36)   1 (-0.74)   6 (+0.33)
 10    2 (-0.07)   3 (-0.45)   0 (-1.02)   5 (+0.06)

Note: Z score equivalents of the unstandardized data are
presented in parentheses.



Table 2 
Analyzed Matrix 
.086 .011 
.048 .027 
Table 3
Calculation of Composites

Case   Criterion Composite                    Predictor Composite
       = (ZX)(F1) + (ZY)(F2)                  = (ZA)(F3) + (ZB)(F4)
  1    (-0.72)(F1) + (+1.36)(F2) = -0.97      (+0.08)(F3) + (+0.33)(F4) = +0.42
  2    (+1.90)(F1) + (-0.15)(F2) = +1.82      (-1.02)(F3) + (+0.88)(F4) = -0.49
  3    (+0.59)(F1) + (+1.36)(F2) = +0.26      (+0.64)(F3) + (-1.33)(F4) = -0.43
  4    (+0.59)(F1) + (-0.15)(F2) = +0.59      (+0.64)(F3) + (+1.16)(F4) = +1.92
  5    (+0.59)(F1) + (-0.45)(F2) = +0.65      (+1.46)(F3) + (-1.33)(F4) = +0.64
  6    (-0.07)(F1) + (-0.76)(F2) = +0.10      (+1.46)(F3) + (-1.33)(F4) = +0.64
  7    (-0.07)(F1) + (-1.36)(F2) = +0.23      (-0.47)(F3) + (+1.16)(F4) = +0.49
  8    (-1.38)(F1) + (-0.76)(F2) = -1.14      (-1.02)(F3) + (+0.06)(F4) = -1.27
  9    (-1.38)(F1) + (+1.36)(F2) = -1.59      (-0.74)(F3) + (+0.33)(F4) = -0.65
 10    (-0.07)(F1) + (-0.45)(F2) = +0.04      (-1.02)(F3) + (+0.06)(F4) = -1.27

Note: The canonical function coefficients are respectively:
F1 = +0.94; F2 = -0.22; F3 = +1.30; F4 = +0.94.



Table 4
Correlation Matrix

Table 5
Canonical Results

Table 6
Regression Solutions

Figure 1
Scattergram of Canonical Composites